Feature Subset selection in Medical Data Mining using cascaded GA & CFS: A filter approach
Abstract
Medical data mining has enormous potential for exploring hidden patterns in data sets from the medical domain. These patterns can be used for clinical diagnosis. Data preprocessing is a significant step in the knowledge discovery process, since quality decisions must be based on quality data. Feature subset selection is one such preprocessing step and is of immense importance in data mining. High-dimensional data makes training and testing of general classification methods difficult. Feature subset selection reduces the number of attributes appearing in the discovered patterns, which makes the patterns easier to understand. It also improves classification accuracy and learning runtime. This paper presents the development of a filter approach for feature subset selection, evaluated on three medical datasets available at the UCI Machine Learning Repository: the Pima Diabetic dataset, the Heart Statlog dataset and the Breast Cancer dataset. The proposed model consists of two stages. In the first stage, a filter approach combining a genetic algorithm (GA) and correlation-based feature selection (CFS) is used in a cascaded fashion. The GA performs a global search over attribute subsets, with fitness evaluated by CFS. In the second stage, fine-tuned classification is performed using three different classifiers, namely Naïve Bayes, Bayesian and Radial basis function. Experimental results indicate that the feature subset identified by the proposed GA+CFS filter, when given as input to the Naïve Bayes, Bayesian and Radial basis function classifiers, yields improved classification accuracy.
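The first stage described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the correlation measure or the GA settings, so the use of absolute Pearson correlation, the function names (`pearson`, `cfs_merit`, `ga_select`) and all parameter values (`pop_size`, `generations`, `p_mut`, `seed`) are assumptions made for the example. The fitness is the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of the selected subset.

```python
import math
import random


def pearson(xs, ys):
    """Absolute Pearson correlation (assumed measure); 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def cfs_merit(columns, target, mask):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff) for the features where mask is 1."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    k = len(chosen)
    if k == 0:
        return 0.0
    # Mean correlation between each selected feature and the class.
    r_cf = sum(pearson(columns[i], target) for i in chosen) / k
    # Mean correlation between every pair of selected features.
    pairs = [(i, j) for a, i in enumerate(chosen) for j in chosen[a + 1:]]
    r_ff = (sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(columns, target, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Binary-coded GA over feature masks, using the CFS merit as fitness."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit(columns, target, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy data: feature 0 tracks the class, feature 1 is noise, feature 2 is constant.
    target = [0, 0, 1, 1, 0, 1, 1, 0]
    columns = [
        [0, 0, 1, 1, 0, 1, 1, 0],
        [3, 1, 2, 2, 1, 3, 1, 3],
        [5, 5, 5, 5, 5, 5, 5, 5],
    ]
    mask, merit = ga_select(columns, target)
    print("selected mask:", mask, "merit:", round(merit, 3))
```

The columns whose mask bit is 1 would then be passed to the second-stage classifiers. Because this is a filter approach, no classifier is trained inside the search loop; the fitness depends only on correlations computed from the data.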
Similar resources
Cascading K-means Clustering and K-Nearest Neighbor Classifier for Categorization of Diabetic Patients (IJEAT)
Medical data mining is the process of extracting hidden patterns from medical data. This paper presents the development of a hybrid model for classifying the Pima Indian diabetic database (PIDD). The model consists of three stages. In the first stage, K-means clustering is used to identify and eliminate incorrectly classified instances. In the second stage Genetic algorithm (GA) and Correlation bas...
A stratified sampling technique based on correlation feature selection method for heart disease risk prediction system
In medicine, data mining methods can be used by physicians to improve clinical diagnosis. In this paper a stratified approach named Correlation Feature Selection Stratified Sampling (CFS-SS) is introduced. The method is applied to a heart disease risk prediction system for medical diagnosis. In the proposed system the attributes are grouped together into homogeneous subgroups, bef...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Filter – GA Based Approach to Feature Selection for Classification
This paper presents a new approach to selecting a reduced number of features in databases. Every database has a given number of features, but some of these features can be redundant or even harmful and can confuse the classification process. The proposed method applies filter attribute measure and binary coded Genetic Algorithm to select a small subset of features...
Publication date: 2010